A Variable Selection Procedure for K-Means Clustering
Similar Resources
Variable Selection for K-Means Quantization
Recent results in quantization theory provide theoretical bounds on the distortion of squared-norm based quantizers (see, e.g., [3] or [10]). These bounds are valid whenever the source distribution has a bounded support, regardless of the dimension of the underlying Hilbertian space. However, it remains of interest to select relevant variables for quantization. This task is usually performed usi...
Selection of K in K-means clustering
The K-means algorithm is a popular data-clustering algorithm. However, one of its drawbacks is the requirement for the number of clusters, K, to be specified before the algorithm is applied. This paper first reviews existing methods for selecting the number of clusters for the algorithm. Factors that affect this selection are then discussed and a new measure to assist the selection is proposed....
Variable neighbourhood search based heuristic for K-harmonic means clustering
Although there has been a rapid development of technology and an increase in computation speeds, most real-world optimization problems still cannot be solved in a reasonable time. Sometimes it is impossible for them to be optimally solved, as there are many instances of real problems which cannot be addressed by computers at their present speed. In such cases, the heuristic approach can be...
Unsupervised Feature Selection for the $k$-means Clustering Problem
We present a novel feature selection algorithm for the k-means clustering problem. Our algorithm is randomized and, assuming an accuracy parameter ε ∈ (0, 1), selects and appropriately rescales in an unsupervised manner Θ(k log(k/ε)/ε) features from a dataset of arbitrary dimensions. We prove that, if we run any γ-approximate k-means algorithm (γ ≥ 1) on the features selected using our method, ...
Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm
Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining, and a common technique for statistical data analysis. This paper proposed an improved version of K-Means algorithm, namely Persistent K...
Journal
Journal title: Korean Journal of Applied Statistics
Year: 2012
ISSN: 1225-066X
DOI: 10.5351/kjas.2012.25.3.471